The construction of Arabidopsis expressed sequence tag assemblies. A new resource to facilitate gene identification.

Authors

  • S D Rounsley
  • A Glodek
  • G Sutton
  • M D Adams
  • C R Somerville
  • J C Venter
  • A R Kerlavage
Abstract

The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.
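The grouping step described in the abstract can be illustrated with a toy sketch. This is not the assembler used by the authors: it assumes exact suffix-to-prefix matches on the same strand, a hypothetical `min_len` overlap threshold, and made-up sequences, whereas real EST assembly must tolerate sequencing errors and reverse complements. The names `overlap_len` and `cluster_ests` are illustrative only.

```python
from itertools import permutations


def overlap_len(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` bases long; otherwise 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a[-k:] == b[:k]:
            return k
    return 0


def cluster_ests(ests: dict[str, str], min_len: int = 8) -> list[set[str]]:
    """Group EST names into clusters of transitively overlapping sequences
    using a union-find structure (exact matches only, an assumption)."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Walk to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in permutations(ests, 2):
        contained = len(ests[b]) >= min_len and ests[b] in ests[a]
        if contained or overlap_len(ests[a], ests[b], min_len) > 0:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in ests:
        groups.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names for a deterministic result.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical toy sequences: est1 and est2 share a 9-base overlap.
toy_ests = {
    "est1": "ATGGCGTACGTTAGC",
    "est2": "TACGTTAGCCGATAA",
    "est3": "GGGTTTCCCAAATTT",
}
clusters = cluster_ests(toy_ests, min_len=8)
```

In this toy input, `est1` and `est2` fall into one cluster and `est3` remains a singleton, which mirrors how redundancy drops when overlapping ESTs from the same gene are merged into one assembly.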

Similar resources

Identification and Expression Analysis of Two Arabidopsis LRR-Protein Encoding Genes Responsive to Some Abiotic Stresses

Abstract: Two Arabidopsis thaliana genes, psr9.2 and psr9.4 appeared to be highly similar to a phosphate-starved induced gene, psr9, isolated from Brassica nigra suspension cells. Sequence analysis classified the encoded polypeptides as members of leucine-rich repeat (LRR) proteins superfamily. The sequence of psr9 proteins comprise a unique N-terminal region e...

Identification of Sequence Variation in the Apolipoprotein A2 Gene and Their Relationship with Serum High-Density Lipoprotein Cholesterol Levels

Background: Apolipoprotein A2 (APOA2) is the second major apolipoprotein of the high-density lipoprotein cholesterol (HDL-C). The study aim was to identify APOA2 gene variation in individuals within two extreme tails of HDL-C levels and its relationship with HDL-C level. Methods: This cross-sectional survey was conducted on participants from Tehran Glucose and Lipid Study (TLGS) at Research Ins...

The Arabidopsis root transcriptome by serial analysis of gene expression. Gene identification using the genome sequence.

Large-scale identification of genes expressed in roots of the model plant Arabidopsis was performed by serial analysis of gene expression (SAGE), on a total of 144,083 sequenced tags, representing at least 15,964 different mRNAs. For tag to gene assignment, we developed a computational approach based on 26,620 genes annotated from the complete sequence of the genome. The procedure selected warr...

Identification and Characterization of LHCB1 Co-Suppressed Line in Arabidopsis

To explore the function of light-harvesting complex protein (LHCP) in Arabidopsis growth and development, the Leclere and Bartel seed collection was screened. In this collection, randomly cloned cDNAs are expressed under the CaMV35S promoter. A pale green line has been identified and characterized in more detail. Analysis of the inserted cDNA in the pale green line showed it encodes LHCB1 prote...

Refined annotation of the Arabidopsis genome by complete expressed sequence tag mapping.

Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating se...

Journal title:
  • Plant physiology

Volume 112, Issue 3

Pages  -

Publication date 1996